What is the best, most efficient, Client pool technique with Erlang - sockets

I'm a real Erlang newbie (started 1 week ago), and I'm trying to learn this language by creating a small but efficient chat server. (When I say efficient I mean I have 5 servers used to stress test this with hundreds of thousands connected client - A million would be great !)
I have find some tutorials doing so, the only thing is, that every tutorial i found, are IRC like. If one user send a message, all user except sender will receive it.
I would like to change that a bit, and use one-to-one discussion.
What would be the most effective client pool for searching a connected user ?
I thought about registering the process, because it seems to do everything I need, but I really don't think this is the better way to do it. (Or most pretty way to do it anyway).
Does anyone would have any suggestions doing this ?
EDIT :
Every connected client is affected to an ID.
When the user is connected, it first send a login command to give it's id.
When an user wants to send a message to another one the message looks like this
[ID-NUMBER][Message] %% ID-NUMBER IS A FIXED LENGTH
When I ask for "the most effective client pool", I'm actually looking for the fastest way to retrieve/add/delete one client on the connected client list which could potentially be large (hundred of thousands -- maybe millions)
EDIT 2 :
For answering some questions :
I'm using Raw Socket (Using telnet right now to communicate with server) - will probably move to ssl later...
It is my own protocol
Every Client is a spawned Pid
Every Client's Pid is linked to it's own monitor (mostly for debugging reason - The client if disconnected should reconnect by it's own starting auth from scratch)
I have read a couple a book before starting coding, So I do not master yet every aspect of Erlang but I'm not unaware of it, I will read more about it when needed I guess.
What I'm really looking for is the best way to store and search thoses PIDs to send message directly from process to process.
Should I write my own search Client function using lists ?
or should I use ets ?
Or even use register/2 unregister/1 and whereis/1 to maintain my client list, using it's unique id as atom, it seems to be the simplest way to do so, I really don't know if it is efficient, but I'm pretty sure this is the ugly solution ;-) ?

I'm doing something similar to your chat program using gproc as a pubsub (similar to the demo on that page). Each client registers as it's id. To find a particular client, you do a lookup on that client id. To subscribe to a client, you add a property to that process of the client id being subscribed to. To publish, you call gproc:send(ClientId,Message). This covers your use case, the more general room based chat as well, and can handle distributed masterless registry of processes.
I haven't tested to see if it scales to millions, but it uses ets to do the storage and gproc is rock solid code by Ulf Wiger. I wouldn't count on being able to write a better implementation.

I'm also kind of new to Erlang (a couple of months), so I hope this can put you in the correct path :)
First of all, since you're a "newbie", you should know about these sites:
Erlang Official Documentation:
Most common modules are in the stdlib application, so start from there.
Alternative Documentation:
There's a real time search engine, so it is really good when searching
for specific modules.
Erlang Programming Rules:
For you to enter in the mindset of erlang programming.
Learn You Some Erlang Book:
A must read for everyone starting with Erlang. It's really comprehensive
and fun to read!
Trapexit.org:
Forum and cookbooks, to search for common problems faced by programmers.
Well, thinking about a non persistent database, I would suggest the sets or gb_sets modules (documentation here).
If you want persistence, you should try dets (see documentation above), but I can't state anything about efficiency, so you should research this topic a bit further.
In the book Learn You Some Erlang there is a chapter on data structures that says that sets are better for read intensive systems, while gb_sets is more appropriate for a balanced usage.

Now, Messaging systems are what everyone wants to do when they come to Erlang because the two naturally blend. However, there are a number of things to look into before one continues. Messaging basically involves the following things: User Registration, User Authentication, Sessions Management,Logging, Message Switching/routing e.t.c. Now, to do all or most of these, one needs to have a Database, certainly IN-MEMORY, thats leads me to either Mnesia or ETS Tables. Since you are new to Erlang, i suppose you have not yet really mastered working with these. At one moment, you will need to maintain Who is communicating with who, Who is available for Chat e.t.c. Hence you might need to look up things and write things some where.Another thing is you have not told us the Client. Is it going to be a Web Client (HTTP), is it an entirely new protocol you are implementing over raw Sockets ? Which ever way, you will need to master something called: Concurrency in Erlang. If a user connects and is assigned an ID, if your design is A process Per User, then you will have to save the Pids of these Processes or register them against some criteria, yet again monitor them if they die e.t.c. Which brings me to OTP and Supervision trees. There is quite alot, however, tell us more about the Client and Server interaction, the Network Communication you need e.t.c. Or is it just a simple Erlang RPC project you are doing for your own revision ?
EDIT Use ETS Tables, or use Mnesia RAM tables. Do not think of registering these Pids or Storing them in a list, Array or set. Look at this solution which was given to this question

Related

What is a good ZeroMQ / nanomsg architecture for a server that sends data to clients that can connect / disconnect?

I am trying to create a network architecture which has a single server and multiple clients. The clients can connect and disconnect at any time so they need to announce their existence or shut-down to the server. The server must be able to send data to any particular client.
What is the best scalability protocols/architecture to use for this?
Currently I use a REQ/REP so that clients can 'login' and 'logout', and a SURVEY socket so that the server can send data to all clients. The message sent has an ID for the particular client it wants to process the message.
Is this good, or is there something better?
Sounds more like you need publisher subscriber. With both 0MQ and nanomsg you don't need to do anything in particular to manage connection / disconnection, the library does that for you.
However if you want more sophisticated message management (such as caching outgoing messages just in case another client chooses to connect) then you will have to manage that yourself. You might use a single push pull from the clients for them to announce their presence (they'd send a message saying who they were), followed by more push pulls from the server to each of the clients to send the messages from the cache that you have. Fiddly, but still a whole lot easier than programming up with raw sockets.
Using req rep can be problematic - if either end crashes or unexpectedly disconnects the other can be left in a stalled, unrecoverable state.
Sorry, in real world, there is no "One Size Fits All"
There are many further aspects, that influence architecture - The Bigger Picture.
While both Martin SUSTRIK's cool kids -- ZeroMQ & nanomsg -- have made a gread help in providing excellend bases + LEGO-type building blocks of Scaleable Formal Communication Patterns, they are only the beginning and saying that REQ/REP or SURVEY Behavioural Primitives ( great innovation, nevertheless still a building block ) are The Architecture would get upset almost all architects and evangelists.
The original question is important, however you already have gotten a first proposal to get it administratively closed as some people with "wider wingspan" feel your question is "too wide" or "opinion"-oriented instead an MCVE code-example driven ( ... yes, StackOverflow life is sometimes fast and cruel ).
So, without any further details available, my recommendation would be to check recent versions of PUB/SUB ( which can and do filter on PUB-side ( not on the SUB as was the design in the early ZeroMQ versions, once already xmited / delivered zillions of bytes round the world to just realise on the globally distributed peers level that no-one has yet SUB-ed to receive anything of that ) instead of the mentioned SURVEY.
Without any context it is nonsense to seriously judge, the less to improve what you try to design and implement.
I would do a poor service I were trying to do so.
The best next step?
What I can do for you right now is to direct you to see a bigger picture on this subject >>> with more arguments, a simple signalling-plane / messaging-plane illustration and a direct link to a must-read book from Pieter HINTJENS.

3-way communication via sockets

Good Afternoon Gurus,
I am pretty familiar with basic socket programming, and the IO::Socket module but I need to code something now that I have not encountered before. It will be a 3 tier application. The first tier is an event-loop that sends messages upstream when certain events are encountered. The second tier is the 'middle-ware' server, which (among other things) acts as the message repository. The third tier is a cgi application, which will update a graphical display.
I am confused on how to set up the server to accept uni-directional connections from multiple clients one one side, and communicate bi-directionally with the cgi application on the other. I can do either of those tasks separately, just not in the same script (yet). Does my question make sense? I would like to stick with using the IO::Socket module, but it is not a requirement by any means. I am not asking for polished code, just advice on setting up the socket(s) and how to communicate from one client to another via the server.
Also, does it make more sense to have the cgi application query the server for new messages, or have the server push the new message upstream to the cgi application? The graphical updates need to be near real-time.
Thank you in advance,
Daren
You said you already have an event loop in the first tier. In a way, your second-tier server should also arrange some kind of event loop for asynchronous processing. There are many ways to code it using perl, like AnyEvent, POE, Event to name just a few. In the end, they all use one of select, poll, epoll, kqueue OS facilities (or their equivalent on Windows). If you feel comfortable coding in a relatively low-level, you can just use perl's select builtin, or, alternatively, its object-oriented counterpart, IO::Select.
Basically you create two listening sockets (you might only need one if the first tier uses the same communication protocol as the third tier to talk to your server), add it to the IO::Select object and do a select on it. Once the connection
is made, you add the accepted sockets to the select object.
The select method of IO::Select will give you back a list of sockets ready for reading or writing (I am ignoring the possibility of exceptions here). Of course you have to keep track of your sockets to know which one is which. Also, the communication logic will be somewhat complicated because you have to use non-blocking sockets.
As for the second part of your question, I am a little bit confused what you mean by "cgi" - whether it is a Common Gateway Interface (i.e., server-side web scripts), or whether it is a shorthand for "computer graphics". In both cases I think that it makes sense for your task to use server push.
In the latter case that's all I'd like to say. In the former case, I suggest you google for "Comet" (as in "AJAX"). :-)
In a standard CGI application, I don't see how you can "push" data to them. For a client interaction, the data goes through the CGI/presentation layer to the middle tier to remain in session storage (or cache) or to the backend to get stored in the database.
That is of course unless you have a thick application layer which is a caching locus and kind of a middle tier in itself.

How to maintain a persistant network-connection between two applications over a network?

I was recently approached by my management with an interesting problem - where I am pretty sure I am telling my bosses the correct information but I really want to make sure I am telling them the correct stuff.
I am being asked to develop some software that has this function:
An application at one location is constantly processing real-time data every second and only generates data if the underlying data has changed in any way.
On the event that the data has changed send the results to another box over a network
Maintains a persistent connection between the both machines, altering the remote box if for some reason the network connection went down
From what I understand, I imagine that I need to do some reading on doing some sort of TCP/IP socket-level stuff. That way if the connection is dropped the remote location will be aware that the data it has received may be stale.
However management seems to be very convinced that this can be accomplished using SOAP. I was under the impression that SOAP is more or less a way for a client to initiate a procedure from a server and get some results via the HTTP protocol. Am I wrong in assuming this? I haven't been able to find much information on how SOAP might be able to solve a problem like this.
I feel like a lot of people around my office are using SOAP as a buzzword and that has generated a bit of confusion over what SOAP actually is - and is capable of.
Any thoughts on how to accomplish this task would be appreciated!
I think SOAP is the wrong tool. SOAP is a spec for exchanging structured data. For your problem, the simplest thing would be to write a program to just transfer data and figure out if the other end is alive. Sockets are a good way to go. There are lots of socket programming tutorials on the net. Pick your language, and ask Mr. Google. Write a couple of demo programs to teach yourself how it works. Ask if you have more specific questions.
For the problem, you'll need a sender and a receiver. The sender sends data when it gets it, the receiver waits for data and hands it off when it arrives. Get that working first. Next, add in heartbeats; a message that says "I'm alive", sent periodically. Get that working next. You'll need to be determine the exact behavior you want -- should both sides send heartbeats to the other end, the maximum time you are willing to wait for a heartbeat, and what action you take should heartbeats stop arriving. The network connection can drop, the other end can crash, the other end can hang, and perhaps there are other conditions you should think about (e.g., what if the real time data is nonsense?). Figure out how to handle each condition, and code up the error handling. Test it out, and serve with a side of documentation.
SOAP certainly won't tell you when the data source goes down, though you could use "heartbeats" to add that.
Probably you are right and they are just repeating a buzz word, and don't actually know much about what SOAP is or does or have any real argument for why it ought to be used here.

What's the best IPC mechanism for medium-sized data in Perl? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 4 years ago.
Improve this question
I'm working on designing a multi-tiered app in Perl and I'm wondering about the pros and cons of the various IPC mechnisms available to me. I'm looking at handling moderately-sized data, typically a few dozen kilobytes but up to a couple of megabytes, and the load is pretty light, at most a couple of hundred requests per minute.
My primary concerns are maintainability and performance (in that order). I don't think I'll need to scale up to more than one server, or port off of our main platform (RHEL), but I suppose it's something to consider.
I can think of the following options:
Temporary files - Simplistic, probably the worst option in terms of speed and storage requirements
UNIX domain sockets - Not portable, not scalable
Internet Sockets - Portable, scalable
Pipes - Portable, not scalable (?)
Considering that scalability and portability are not my primary concerns, I need to learn more. What's the best choice, and why? Please comment if you need additional information.
EDIT: I'll try to give more detail in response to ysth's questions (warning, wall of text follows):
Are readers/writers in a one-to-one relationship, or something more more complicated?
What do you want to happen to the writer if the reader is no longer there or busy?
And vice versa?
What other information do you have about your desired usage?
At this point, I'm contemplating a three-tiered approach, but I'm not sure how many processes I'll have in each tier. I think I need to have more processes towards the left side and fewer toward the right, but maybe I should have the same number across the board:
.---------. .----------. .-------.
| Request | -----> | Business | -----> | Data |
| Manager | <----- | Logic | <----- | Layer |
`---------' `----------' `-------'
These names are still generic and probably won't make it into the implementation in these forms.
The request manager is responsible for listening for requests from different interfaces, for example web requests and CLI (where response time is important) and e-mail (where response time is less important). It performs logging and manages the responses to the requests (which are rendered in a format appropriate to the type of request).
It sends data about the request to the business logic which performs logging, authorization depending on business rules, etc.
The business logic (if it needs to) then requests data from the data layer, which can either talk to (most often) the internal MySQL database or some other data source outside our team's control (e.g., our organization's primary LDAP servers, or our DB2 employee information database, etc.). This is mostly simply a wrapper which formats the data in a uniform way so that it can be handled more easily in the business logic.
The information then flows back to to the request manager for presentation.
If, when data is flowing to the right, the reader is busy, for the interactive requests I'd like to simply wait a suitable period of time, and return a timeout error if I don't get access in that amount of time (e.g. "Try again later"). For the non-interactive requests (e.g. e-mail), the polling system can simply exit and try again on the next invocation (which will probably be once per 1-3 minutes).
When data is flowing in the other direction, there shouldn't be any waiting situations. If one of the processes has died when trying to travel back to the left, all I can really do is log and exit.
Anyway, that was pretty verbose, and since I'm still in early design I probably still have some confused ideas in there. Some of what I've mentioned is probably tangential to the issue of which IPC system to use. I'm open to other suggestions on the design, but I was trying to keep the question limited in scope (For example, maybe I should consider collapsing down to two tiers, which is a much simpler for IPC). What are your thoughts?
If you're unsure about your exact requirements at the moment, try to think of a simple interface that you can code to, that any IPC implementation (be it temporary files, TCP/IP or whatever) needs to support. You can then choose a particular IPC flavour (I would start with whatever's easiest and/or easiest to debug -- probably temporary files) and implement the interface using that. If that turns out to be too slow, implement the interface using e.g. TCP/IP. Actually implementing the interface does not involve much work as you will essentially just be forwarding calls to some existing library.
The point is that you have a high-level task to perform ("transmit data from program A to program B") which is more or less independent of the details of how it is performed. By establishing an interface and coding to it, you isolate the main program from changes in the event that you need to change the implementation.
Note that you don't need to use any heavyweight Perl language mechanisms to capitalise on the idea of having an interface. You could simply have e.g. 3 different packages (for temp files, TCP/IP, Unix domain sockets), each of which exports the same set of methods. Choosing which implementation you want to use in your main program amounts to choosing which module to use.
Temporary files (and related things, like a shared memory region), are probably a bad bet. If you ever want to run your server on one machine and your clients on another, you will need to rewrite your application. If you pick any of the other options, at least the semantics are the essentially the same, if you need to switch between them at a later date.
My only real advice, though, is to not write this yourself. On the server side, you should use POE (or Coro, etc.), rather than doing select on the socket yourself. Also, if your interface is going to be RPC-ish, use something like JSON-RPC-Common/ from the CPAN.
Finally, there is IPC::PubSub, which might work for you.
Temporary files have other problems besides that. I think Internet socks are really the best choice. They are well documented, and as you say, scalable and portable. Even if that is not a core requirement, you get it nearly for free. Sockets are pretty easy to deal with, again there is copious amounts of documentation. You can build out your data sharing mechanism and protocol out in a library and never have to look at it again!
UNIX domain sockets are portable across unices. It's no less portable than pipes. It's also more efficient than IP sockets.
Anyway, you missed a few options, shared memory for example. Some would add databases to that list but I'd say that's a rather heavyweight solution.
Message queues would also be a possibility, though you'd have to change a kernel option for it to handle such large messages. Otherwise, they have an ideal interface for a lot of things, and IMHO they are greatly underused.
I generally agree though that using an existing solution is better than building somethings of your own. I don't know the specifics of your problem, but I'd suggest you'd check out the IPC section of CPAN
There are so many different options because most of them are better for some particular case, but you haven't really given any information that would identify your case.
Are readers/writers in a one-to-one relationship, or something more more complicated?
What do you want to happen to the writer if the reader is no longer there or busy? And vice versa?
What other information do you have about your desired usage?
For "interactive" requests (holding the connection open while waiting for a response (asynchronously or not): HTTP + JSON. JSON::XS is insanely fast. Everyone and everything can speak HTTP and it's easy to load balance, debug, ...
For queued requests ("please do this, thanks!"): Beanstalkd and Beanstalk::Client. Serialize the requests in the beanstalk queue with JSON.
Thrift might also be worth looking into depending on your application.

Deciphering MMORPG Protocol Encoding

I plan on writing an automated bot for a game.
The tricky part is figuring out how they encoded their protocol... To make the bot run around is easy, simply make the character run and record what it does in wireshark. However, interpreting the environment is more difficult... It recieves about 5 packets each second if you are idle, hence lots of garbarge.
My plan: Because the game runs under TCP, I will use freecap (http://www.freecap.ru/eng) to force the game to connect to a proxy running on my machine. I will need this proxy to be capable of packet injection, or perhaps a server that is capable of resending captured packets. This way I can recreate and tinker around with what the server sends, and understand their protocol encoding.
Does anyone know where I can get a proxy that allows packet injection or where I can perform packet injection (not via hardware, as is the case with wireless or anything!)
Where of if I can find a server/proxy that resends captured packets (ie: replays a connection).
Any better tools or methodologies for pattern matching? Something which can highlight patterns from mutliple messages would be GREAT.
OR, is there a better way to decipher this here? Possibly a dissasembly strategy (via hooking a winsock function and starting the dissassembly from there) ? I have not done this before so I am not sure. OR , any other ideas?
Network traffic interception and protocol analysis is generally a less favored method to accomplish your goal here. For most modern games, encryption is a serious factor, and there are serious headaches associated with the protocol analysis for any but trivial factors of the most common gameplay scenarios.
Most modern implementations* of what you are trying to do rely on reading and manipulating the memory space and process of a running client. The client will have already done all the hard parts for you, including decrypting the traffic and sorting it into far more easy to read data structures. For interacting with the server you can call functions built into the client instead of crafting entire series of packets from scratch. The plus to this approach is that you have to do far less work to interpret the data and produce activity. The minus is that there is often some data in the network traffic that would be useful to a bot but is discarded by the client, or that you may want to send traffic to the server that the client cannot produce (which, in my own well-developed hierarchy for such, is a few steps farther down the "cheating" slope).
*...I say this having seen the evolution of the majority of MMORPG botting/hacking communities from network protocol analyzers like ShowEQ and Odin's Eye / Excalibur to memory-based applications like MacroQuest and InnerSpace. On that note, InnerSpace provides an excellent extensible framework for the memory/process-based variant of what you are attempting, and you should look into it as a basis for your project if you abandon the network analysis approach.
As I've done a few game bots in the past (for fun, not profit or griefing of course - writing game bots is a lot of fun), I recommend the following:
If you can code and there isn't cheat protection preventing you from doing it, I highly recommend writing an injected DLL for the following reasons:
Your DLL will be able to access the game's memory space directly, and once you reverse-engineer the data structures (either by poking around memory or by code disassembly), you'll have access to lots of data. This will also allow you to bypass any network encryption the game may have. The downside of accessing process memory directly is that offsets and data structures change between versions - however, data structures don't change very often with a stable game, and you can compensate offset changes by searching for code patterns instead of using fixed offsets.
Either way, you'll still be able to hook WinSock functions using API hooks (check out Microsoft Detours and the excellent but now-commercial madCodeHook).
otherwise, I can only advise that you give live/interactive packet editors like WPE Pro a try.
In most scenarios, the coolest methods (code reverse-engineering and direct memory access) tend to be the least productive. They require a lot of skill (to understand the code) and time, both initially (to go through all the code and develop code to interact with the data structure) and for maintainance (in case the game is being updated). (Of course, they sometimes do allow doing cool stuff which is impossible to do with the official client, but most of the time this is obvious as blatant cheating, and likely to attract the GMs quickly). Most of the time bots are made by replacing game graphics/textures with solid colours, and creating simple "pixel" bots which search for certain colours on the screen and react accordingly (e.g. click them).
Hope this helps, and remember - cheating is only fun when it doesn't make the game less fun for everyone else ;)
There are probably a few reasonable assumptions you can make that should simplify your task enormously. However, to make the best use of them you will probably need greater comfort with sleeves-rolled-up programming than it sounds like you have.
First, it's a safe bet that the encryption they are using falls into one of three categories:
None
Cheesy
Far better than you are likely to crack
With the odds of the middle case being very low.
Next, it's a safe bet that the packets are encrypted / decrypted close to the edge of the program (right as they come in, right before they go out) and that the body of the game deals with them in decrypted form.
Finally, the protocol they are using most likely consists of either
ascii with data blocks
binary goo
So do a little packet sniffing with a card set in promiscuous mode for unencrypted ascii. If you see some, great, you're ahead of the game. But if you don't give up the whole tapping-the-line idea and instead start following the code as it returns from the sending data out by breakpointing and stepping with a debugger. Figure the outermost layer or three will be standard network stuff, then will come the encryption layer, and beyond that the huge mass of stuff that deals with the protocol unencrypted.
You should be able to get this far in an hour if you're hot, a weekend if you're reasonably skilled, motivated, and diligent, and never if you are hopeless. But it is possible in principle (and doubtlessly far easier in practice) to do it this way.
Once you get to where something that looks like unencrypted goo comes in, gets mungled, and the mungled form goes out, then start worrying about what it means.
-- MarkusQ
A) I play a MMO and do not support bots, voting down...
B) Download Backtrack v.3, run an arpspoof on your default gateway and your host. There is an application that will spoof the remote host's SSL cert sslmitm (I believe is the name) which will then allow you to create a full connection through your host. Then fireup tcpdump/ethereal/wireshark (choose your pcap poison) and move around do random stuff to find out what packet is doing what. That will be your biggest challenge; but proxying with a Man in the Middle attack on yourself is the way to go.
C) I do not condone this activity, this information is only being provided as free information.
Sounds like there is not encryption going on, so you could do a network approach.
A great place to start would be to find the packet ID's - most of the time, something near the front of the packet is going to be an ID of the type of the packet. For example move could be 1, shoot fired could be "2", chat could be "4".
You can write your own proxy that listens on one port for your game to connect, and then connects to the server. You can make keypresses to your proxy fire off commands, or you can make your proxy write out debugging info to help you go further.
(I've written a bot for an online in game in PHP - of all things.)