Golang - Scaling a websocket client for multiple connections to different servers

I have a websocket client. In reality, it is far more complex than the basic code shown below.
I now need to scale this client code to open connections to multiple servers. Ultimately, the tasks that need to be performed when a message is received from any of the servers are identical.
What would be the best approach to handle this?
As I said above, the actual work done when a message is received is far more complex than shown in the example.
package main

import (
	"flag"
	"log"
	"net/url"
	"os"
	"os/signal"
	"time"

	"github.com/gorilla/websocket"
)

var addr = flag.String("addr", "localhost:1234", "http service address")

func main() {
	flag.Parse()
	log.SetFlags(0)
	interrupt := make(chan os.Signal, 1)
	signal.Notify(interrupt, os.Interrupt)
	// u := url.URL{Scheme: "ws", Host: *addr, Path: "/echo"}
	u := url.URL{Scheme: "ws", Host: *addr, Path: "/"}
	log.Printf("connecting to %s", u.String())
	c, _, err := websocket.DefaultDialer.Dial(u.String(), nil)
	if err != nil {
		log.Fatal("dial:", err)
	}
	defer c.Close()
	done := make(chan struct{})
	go func() {
		defer close(done)
		for {
			_, message, err := c.ReadMessage()
			if err != nil {
				log.Println("read:", err)
				return
			}
			log.Printf("recv: %s", message)
		}
	}()
	ticker := time.NewTicker(time.Second)
	defer ticker.Stop()
	for {
		select {
		case <-done:
			return
		case t := <-ticker.C:
			err := c.WriteMessage(websocket.TextMessage, []byte(t.String()))
			if err != nil {
				log.Println("write:", err)
				return
			}
		case <-interrupt:
			log.Println("interrupt")
			// Cleanly close the connection by sending a close message and then
			// waiting (with timeout) for the server to close the connection.
			err := c.WriteMessage(websocket.CloseMessage, websocket.FormatCloseMessage(websocket.CloseNormalClosure, ""))
			if err != nil {
				log.Println("write close:", err)
				return
			}
			select {
			case <-done:
			case <-time.After(time.Second):
			}
			return
		}
	}
}

Modify the interrupt handling to close a channel on interrupt. This allows multiple goroutines to wait on the event by waiting for the channel to close.
shutdown := make(chan struct{})
interrupt := make(chan os.Signal, 1)
signal.Notify(interrupt, os.Interrupt)
go func() {
	<-interrupt
	log.Println("interrupt")
	close(shutdown)
}()
Move the per-connection code to a function. This code is a copy and paste from the question with two changes: the interrupt channel is replaced with the shutdown channel; the function notifies a sync.WaitGroup when the function is done.
func connect(u string, shutdown chan struct{}, wg *sync.WaitGroup) {
	defer wg.Done()
	log.Printf("connecting to %s", u)
	c, _, err := websocket.DefaultDialer.Dial(u, nil)
	if err != nil {
		log.Fatal("dial:", err)
	}
	defer c.Close()
	done := make(chan struct{})
	go func() {
		defer close(done)
		for {
			_, message, err := c.ReadMessage()
			if err != nil {
				log.Println("read:", err)
				return
			}
			log.Printf("recv: %s", message)
		}
	}()
	ticker := time.NewTicker(time.Second)
	defer ticker.Stop()
	for {
		select {
		case <-done:
			return
		case t := <-ticker.C:
			err := c.WriteMessage(websocket.TextMessage, []byte(t.String()))
			if err != nil {
				log.Println("write:", err)
				return
			}
		case <-shutdown:
			// Cleanly close the connection by sending a close message and then
			// waiting (with timeout) for the server to close the connection.
			err := c.WriteMessage(websocket.CloseMessage, websocket.FormatCloseMessage(websocket.CloseNormalClosure, ""))
			if err != nil {
				log.Println("write close:", err)
				return
			}
			select {
			case <-done:
			case <-time.After(time.Second):
			}
			return
		}
	}
}
Declare a sync.WaitGroup in main(). For each websocket endpoint that you want to connect to, increment the WaitGroup and start a goroutine to connect to that endpoint. After starting the goroutines, wait on the WaitGroup for the goroutines to complete.
var wg sync.WaitGroup
for _, u := range endpoints { // endpoints is a []string of endpoint URLs to connect to.
	wg.Add(1)
	go connect(u, shutdown, &wg)
}
wg.Wait()
The code above with an edit to make it run against Gorilla's echo example server is posted on the playground.
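For completeness, here is a rough sketch of how main() could tie the pieces above together; the endpoints list is a placeholder you would replace with your real server URLs:
package main

import (
	"log"
	"os"
	"os/signal"
	"sync"
)

func main() {
	log.SetFlags(0)

	// Placeholder list of endpoints; replace with your own configuration.
	endpoints := []string{
		"ws://localhost:1234/",
		"ws://localhost:5678/",
	}

	// Close the shutdown channel on interrupt so every connection goroutine sees it.
	shutdown := make(chan struct{})
	interrupt := make(chan os.Signal, 1)
	signal.Notify(interrupt, os.Interrupt)
	go func() {
		<-interrupt
		log.Println("interrupt")
		close(shutdown)
	}()

	// One connect goroutine per endpoint; wait for all of them to finish.
	var wg sync.WaitGroup
	for _, u := range endpoints {
		wg.Add(1)
		go connect(u, shutdown, &wg)
	}
	wg.Wait()
}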

Is the communication with every server completely independent of the others? If yes, I would go about it in a fashion like this:
In main, create a context with a cancellation function.
Create a WaitGroup in main to track the fired-up goroutines.
For every server, add to the WaitGroup and fire up a new goroutine from the main function, passing it the context and the WaitGroup references.
main goes into a for/select loop listening for signals; if one arrives, it calls the cancel function and waits on the WaitGroup.
main can also listen on a result channel from the goroutines and print the results itself, if the goroutines shouldn't do that directly.
Every goroutine has, as we said, references to the WaitGroup, the context and possibly a channel to return results on. Now the approach splits depending on whether the goroutine must do one thing only, or a sequence of things.
If only one thing is to be done, we follow an approach like the one described here (observe that, to be asynchronous, it would in turn fire up a new goroutine to perform the DoSomething() step, which returns the result on the channel).
That allows it to accept the cancellation signal at any time. It is up to you to determine how non-blocking you want to be and how promptly you want to respond to cancellation signals. A further benefit of having a context passed to the goroutines is that you can call the context-enabled versions of most library functions. For example, if you want your dials to have a timeout of, say, one minute, you would create a new context with timeout from the one passed in and then DialContext with that. This lets the dial stop either on the timeout or when the parent context's cancel function (the one you created in main) is called. A rough sketch of this setup follows below.
If more things need to be done, I usually prefer to do one thing per goroutine, have it invoke a new goroutine with the next step to be performed (passing all the references down the pipeline), and exit.
This approach scales well with cancellations, being able to stop the pipeline at any step, and supporting contexts with deadlines for steps that can take too long.
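A rough sketch of that shape, reusing gorilla/websocket from the question; the worker body, the placeholder URLs and the one-minute dial timeout are assumptions to illustrate the idea, not a drop-in implementation:
package main

import (
	"context"
	"log"
	"os"
	"os/signal"
	"sync"
	"time"

	"github.com/gorilla/websocket"
)

func worker(ctx context.Context, wg *sync.WaitGroup, addr string, results chan<- string) {
	defer wg.Done()

	// Derive a dial timeout from the parent context, as described above.
	dialCtx, cancel := context.WithTimeout(ctx, time.Minute)
	defer cancel()

	c, _, err := websocket.DefaultDialer.DialContext(dialCtx, addr, nil)
	if err != nil {
		log.Println("dial:", err)
		return
	}

	// Closing the connection on cancellation unblocks the ReadMessage call below.
	go func() {
		<-ctx.Done()
		c.Close()
	}()

	for {
		_, message, err := c.ReadMessage()
		if err != nil {
			log.Println("read:", err)
			return
		}
		results <- string(message)
	}
}

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	interrupt := make(chan os.Signal, 1)
	signal.Notify(interrupt, os.Interrupt)
	go func() {
		<-interrupt
		cancel() // tell every worker to stop
	}()

	results := make(chan string)
	servers := []string{"ws://localhost:1234/", "ws://localhost:5678/"} // placeholder URLs

	var wg sync.WaitGroup
	for _, s := range servers {
		wg.Add(1)
		go worker(ctx, &wg, s, results)
	}

	// Close results once every worker has exited, then drain it here.
	go func() {
		wg.Wait()
		close(results)
	}()
	for r := range results {
		log.Println("recv:", r)
	}
}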

Related

pg-go RunInTransaction not rolling back the transaction

I'm trying to roll back a transaction in my unit tests, between scenarios, to keep the database empty and not make my tests dirty. So, I'm trying:
for _, test := range tests {
	db := connect()
	_ = db.RunInTransaction(func() error {
		t.Run(test.name, func(t *testing.T) {
			for _, r := range test.objToAdd {
				err := db.PutObj(&r)
				require.NoError(t, err)
			}
			objReturned, err := db.GetObjsWithFieldEqualsXPTO()
			require.NoError(t, err)
			require.Equal(t, test.queryResultSize, len(objReturned))
		})
		return fmt.Errorf("returning error to clean up the database rolling back the transaction")
	})
}
I was expecting the transaction to be rolled back at the end of each scenario, so that the next iteration of the loop would see an empty database, but when I run it the data is never rolled back.
I believe I'm trying to do what the doc suggests: https://pg.uptrace.dev/faq/#how-to-test-mock-database, am I right?
More info: I noticed that my interface implements a layer over RunInTransaction:
func (gs *DB) RunInTransaction(fn func() error) error {
	f := func(*pg.Tx) error { return fn() }
	return gs.pgDB.RunInTransaction(f)
}
I don't know what the problem is yet, but I really suspect it is related to that (because the Tx is encapsulated entirely inside the RunInTransaction implementation).
go-pg uses connection pooling (in common with most Go database packages). This means that when you call a database function (e.g. db.Exec) it will grab a connection from the pool (establishing a new one if needed), run the command and return the connection to the pool.
When running a transaction you need to run BEGIN, whatever updates etc you require, followed by COMMIT/ROLLBACK, on a single connection dedicated to the transaction (any commands sent on other connections are not part of the transaction). This is why Begin() (and effectively RunInTransaction) provide you with a pg.Tx; use this to run commands within the transaction.
example_test.go provides an example covering the usage of RunInTransaction:
incrInTx := func(db *pg.DB) error {
	// The transaction is automatically rolled back on error.
	return db.RunInTransaction(func(tx *pg.Tx) error {
		var counter int
		_, err := tx.QueryOne(
			pg.Scan(&counter), `SELECT counter FROM tx_test FOR UPDATE`)
		if err != nil {
			return err
		}
		counter++
		_, err = tx.Exec(`UPDATE tx_test SET counter = ?`, counter)
		return err
	})
}
You will note that this only uses the pg.DB when calling RunInTransaction; all database operations use the transaction tx (a pg.Tx). tx.QueryOne will be run within the transaction; if you ran db.QueryOne then that would be run outside of the transaction.
So RunInTransaction begins a transaction and passes the relevant Tx in as a parameter to the function you provide. You wrap this with:
func (gs *DB) RunInTransaction(fn func() error) error {
	f := func(*pg.Tx) error { return fn() }
	return gs.pgDB.RunInTransaction(f)
}
This effectively ignores the pg.Tx, and you then run commands on other connections from the pool (e.g. err := db.PutObj(&r)), i.e. outside of the transaction. To fix this you need to use the transaction (e.g. err := tx.PutObj(&r)).
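One way to fix the wrapper (a sketch, assuming the go-pg version whose RunInTransaction takes just the callback, as your wrapper does) is to stop hiding the pg.Tx and hand it to the caller; note this changes the wrapper's signature, so the test and any PutObj-style helpers need variants that work on a *pg.Tx:
// RunInTransaction exposes the pg.Tx so that callers run their statements on
// the transaction's dedicated connection instead of on pooled connections.
func (gs *DB) RunInTransaction(fn func(tx *pg.Tx) error) error {
	return gs.pgDB.RunInTransaction(fn)
}
The test then becomes _ = db.RunInTransaction(func(tx *pg.Tx) error { ... }) and everything inside uses tx (for example a hypothetical putObjTx(tx, &r)) rather than db.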

Write on a closed net.Conn but returned nil error

Talk is cheap, so here is the simple code:
package main

import (
	"fmt"
	"net"
	"time"
)

func main() {
	addr := "127.0.0.1:8999"
	// Server
	go func() {
		tcpaddr, err := net.ResolveTCPAddr("tcp4", addr)
		if err != nil {
			panic(err)
		}
		listen, err := net.ListenTCP("tcp", tcpaddr)
		if err != nil {
			panic(err)
		}
		for {
			if conn, err := listen.Accept(); err != nil {
				panic(err)
			} else if conn != nil {
				go func(conn net.Conn) {
					buffer := make([]byte, 1024)
					n, err := conn.Read(buffer)
					if err != nil {
						fmt.Println(err)
					} else {
						fmt.Println(">", string(buffer[0:n]))
					}
					conn.Close()
				}(conn)
			}
		}
	}()
	time.Sleep(time.Second)
	// Client
	if conn, err := net.Dial("tcp", addr); err == nil {
		for i := 0; i < 2; i++ {
			_, err := conn.Write([]byte("hello"))
			if err != nil {
				fmt.Println(err)
				conn.Close()
				break
			} else {
				fmt.Println("ok")
			}
			// sleep 10 seconds and re-send
			time.Sleep(10 * time.Second)
		}
	} else {
		panic(err)
	}
}
Output:
> hello
ok
ok
The client writes to the server twice. After the first read, the server closes the connection immediately, but the client sleeps 10 seconds and then writes to the server again using the same, already closed, connection object (conn).
Why does the second write succeed (the returned error is nil)?
Can anyone help?
PS:
In order to check whether the system's buffering affects the result of the second write, I edited the client like this, but it still succeeds:
// Client
if conn, err := net.Dial("tcp", addr); err == nil {
	_, err := conn.Write([]byte("hello"))
	if err != nil {
		fmt.Println(err)
		conn.Close()
		return
	} else {
		fmt.Println("ok")
	}
	// sleep 10 seconds and re-send
	time.Sleep(10 * time.Second)
	b := make([]byte, 400000)
	for i := range b {
		b[i] = 'x'
	}
	n, err := conn.Write(b)
	if err != nil {
		fmt.Println(err)
		conn.Close()
		return
	} else {
		fmt.Println("ok", n)
	}
	// sleep 10 seconds and re-send
	time.Sleep(10 * time.Second)
} else {
	panic(err)
}
There are several problems with your approach.
Sort-of a preface
The first one is that you do not wait for the server goroutine
to complete.
In Go, once main() exits for whatever reason,
all the other goroutines still running, if any, are simply
torn down forcibly.
You're trying to "synchronize" things using timers,
but this only works in toy situations, and even then it
does so only from time to time.
Hence let's fix your code first:
package main

import (
	"fmt"
	"log"
	"net"
	"time"
)

func main() {
	addr := "127.0.0.1:8999"

	tcpaddr, err := net.ResolveTCPAddr("tcp4", addr)
	if err != nil {
		log.Fatal(err)
	}
	listener, err := net.ListenTCP("tcp", tcpaddr)
	if err != nil {
		log.Fatal(err)
	}

	// Server
	done := make(chan error)
	go func(listener net.Listener, done chan<- error) {
		for {
			conn, err := listener.Accept()
			if err != nil {
				done <- err
				return
			}
			go func(conn net.Conn) {
				var buffer [1024]byte
				n, err := conn.Read(buffer[:])
				if err != nil {
					log.Println(err)
				} else {
					log.Println(">", string(buffer[0:n]))
				}
				if err := conn.Close(); err != nil {
					log.Println("error closing server conn:", err)
				}
			}(conn)
		}
	}(listener, done)

	// Client
	conn, err := net.Dial("tcp", addr)
	if err != nil {
		log.Fatal(err)
	}
	for i := 0; i < 2; i++ {
		_, err := conn.Write([]byte("hello"))
		if err != nil {
			log.Println(err)
			err = conn.Close()
			if err != nil {
				log.Println("error closing client conn:", err)
			}
			break
		}
		fmt.Println("ok")
		time.Sleep(2 * time.Second)
	}

	// Shut the server down and wait for it to report back
	err = listener.Close()
	if err != nil {
		log.Fatal("error closing listener:", err)
	}
	err = <-done
	if err != nil {
		log.Println("server returned:", err)
	}
}
I've also made a couple of minor fixes, like using log.Fatal (which is log.Print + os.Exit(1)) instead of panicking, removed useless else clauses to adhere to the coding standard of keeping the main flow where it belongs, and lowered the client's timeout.
I have also added checks for the errors that Close on sockets may return.
The interesting part is that we now properly shut the server down by closing the listener and then waiting for the server goroutine to report back (unfortunately, Go does not return an error of a custom type from net.Listener.Accept in this case, so we can't really check that Accept exited because we closed the listener).
Anyway, our goroutines are now properly synchronized, and there is
no undefined behaviour, so we can reason about how the code works.
Remaining problems
Some problems still remain.
The more glaring one is your wrong assumption that TCP preserves
message boundaries—that is, that if you write "hello" to the client
end of the socket, the server reads back "hello".
This is not true: TCP considers both ends of the connection
as producing and consuming opaque streams of bytes.
This means, when the client writes "hello", the client's
TCP stack is free to deliver "he" and postpone sending "llo",
and the server's stack is free to yield "hell" to the read
call on the socket and only return "o" (and possibly some other
data) in a later read.
So, to make the code "real" you'd need to somehow introduce these
message boundaries into the protocol above TCP.
In this particular case the simplest approach would be either
using "messages" consisting of a fixed-length and agreed-upon
endianness prefix indicating the length of the following
data and then the string data itself.
The server would then use a sequence like
var msg [4100]byte
_, err := io.ReadFull(sock, msg[:4])
if err != nil { ... }
mlen := int(binary.BigEndian.Uint32(msg[:4]))
if mlen < 0 || mlen > len(msg)-4 {
	// handle error
}
if mlen == 0 {
	// empty message; goto 1
}
_, err = io.ReadFull(sock, msg[4:4+mlen])
if err != nil { ... }
s := string(msg[4 : 4+mlen])
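For symmetry, the sending side of that framing could look roughly like this (a sketch; it assumes the same 4-byte big-endian length prefix as the read sequence above):
// writeMsg frames s with a 4-byte big-endian length prefix, matching the
// reading sequence above.
func writeMsg(conn net.Conn, s string) error {
	var hdr [4]byte
	binary.BigEndian.PutUint32(hdr[:], uint32(len(s)))
	if _, err := conn.Write(hdr[:]); err != nil {
		return err
	}
	_, err := conn.Write([]byte(s))
	return err
}
In real code you would likely combine the header and payload into a single Write or use a bufio.Writer to avoid two small writes per message.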
Another approach is to agree on that the messages do not contain
newlines and terminate each message with a newline
(ASCII LF, \n, 0x0a).
The server side would then use something like
a usual bufio.Scanner loop to get
full lines from the socket.
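For the newline-delimited variant, the server's read loop might look roughly like this (a sketch with minimal error handling, given an established conn):
// Read one newline-terminated message at a time.
scanner := bufio.NewScanner(conn)
for scanner.Scan() {
	msg := scanner.Text() // one message, trailing '\n' stripped
	log.Println(">", msg)
}
if err := scanner.Err(); err != nil {
	log.Println("read:", err)
}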
The remaining problem with your approach is not dealing with
what Read on a socket returns: note that io.Reader.Read
(which is what sockets implement, among other things) is allowed
to return an error while having read some data from the
underlying stream. In your toy example this might rightfully
be unimportant, but suppose you're writing a wget-like
tool which is able to resume downloading a file: even if
reading from the server returned some data and an error, you
have to deal with that returned chunk first and only then
handle the error.
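In code, that distinction looks roughly like this: consume the returned bytes first, then look at the error (process is a hypothetical handler):
buf := make([]byte, 32*1024)
for {
	n, err := conn.Read(buf)
	if n > 0 {
		// Handle the chunk before inspecting err: a single Read may
		// return both data and an error.
		process(buf[:n]) // process is a hypothetical handler
	}
	if err != nil {
		if err != io.EOF {
			log.Println("read:", err)
		}
		break
	}
}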
Back to the problem at hand
The problem presented in the question, I believe, happens simply because in your setup you hit some TCP buffering problem due to the tiny length of your messages.
On my box which runs Linux 4.9/amd64 two things reliably "fix"
the problem:
Sending messages of 4000 bytes in length: the second call
to Write "sees" the problem immediately.
Doing more Write calls.
For the former, try something like
msg := make([]byte, 4000)
for i := range msg {
	msg[i] = 'x'
}
for {
	_, err := conn.Write(msg)
	...
and for the latter—something like
for {
	_, err := conn.Write([]byte("hello"))
	...
	fmt.Println("ok")
	time.Sleep(time.Second / 2)
}
(it's sensible to lower the pause between sending stuff in
both cases).
It's interesting to note that the former example hits the
write: connection reset by peer (ECONNRESET in POSIX)
error while the second one hits write: broken pipe
(EPIPE in POSIX).
This is because when we're sending in chunks worth 4k bytes,
some of the packets generated for the stream manage to become
"in flight" before the server's side of the connection manages
to propagate the information on its closure to the client,
and those packets hit an already closed socket and get rejected
with the RST TCP flag set.
In the second example an attempt to send another chunk of data
sees that the client side already knows that the connection
has been torn down and fails the send without "touching
the wire".
TL;DR, the bottom line
Welcome to the wonderful world of networking. ;-)
I'd recommend buying a copy of "TCP/IP Illustrated",
reading it, and experimenting.
TCP (and IP, and the other protocols above IP)
often does not work the way people expect it to when they
apply their "common sense".

Close the goroutine reading from a TCP connection without closing connection

I love the way Go handles I/O multiplexing internally with epoll and other mechanisms, scheduling green threads (goroutines here) on its own and giving you the freedom to write synchronous code.
I know TCP sockets are non-blocking and a read will give EAGAIN when no data is available. Given that, conn.Read(buffer) detects this and blocks the goroutine doing the read while no data is available in the socket buffer. Is there a way to stop such a goroutine without closing the underlying connection? I am using a connection pool, so closing the TCP connection doesn't make sense for me; I want to return that connection back to the pool.
Here is the code to simulate such scenario:
func main() {
	conn, _ := net.Dial("tcp", "127.0.0.1:9090")
	// Spawning a go routine
	go func(conn net.Conn) {
		var message bytes.Buffer
		for {
			k := make([]byte, 255) // buffer
			m, err := conn.Read(k) // blocks here
			if err != nil {
				if err != io.EOF {
					fmt.Println("Read error : ", err)
				} else {
					fmt.Println("End of the file")
				}
				break // terminate loop if error
			}
			// converting bytes to string for printing
			if m > 0 {
				for _, b := range k {
					message.WriteByte(b)
				}
				fmt.Println(message.String())
			}
		}
	}(conn)
	// prevent main from exiting
	select {}
}
What other approaches can I take if that's not possible?
1) Call syscall.Read and handle this manually. In this case, I need a way to check whether the socket is readable before calling syscall.Read, otherwise I will end up wasting unnecessary CPU cycles. For my scenario, I think I can skip the event-based polling and keep calling syscall.Read, as there will always be data in my use case.
2) Any suggestions :)
func receive(conn net.Conn, kill <-chan struct{}) error {
	// Spawn a goroutine to read from the connection.
	data := make(chan []byte)
	readErr := make(chan error)
	go func() {
		for {
			b := make([]byte, 255)
			_, err := conn.Read(b)
			if err != nil {
				readErr <- err
				break
			}
			data <- b
		}
	}()
	for {
		select {
		case b := <-data:
			// Do something with `b`.
			_ = b
		case err := <-readErr:
			// Handle the error.
			return err
		case <-kill:
			// Received kill signal, returning without closing the connection.
			return nil
		}
	}
}
Send an empty struct to kill from another goroutine to stop receiving from the connection. Here's a program that stops receiving after a second:
kill := make(chan struct{})
go func() {
	if err := receive(conn, kill); err != nil {
		log.Fatal(err)
	}
}()
time.Sleep(time.Second)
kill <- struct{}{}
This might not be exactly what you're looking for, because the reading goroutine would still be blocked on Read even after you send to kill. However, the goroutine that handles incoming reads would terminate.

Go Unix Domain Socket: bind address already in use

I have the following server code, which listens via a unix domain socket:
package main

import (
	"log"
	"net"
	"os"
	"os/signal"
	"syscall"
)

func echoServer(c net.Conn) {
	for {
		buf := make([]byte, 512)
		nr, err := c.Read(buf)
		if err != nil {
			return
		}
		data := buf[0:nr]
		println("Server got:", string(data))
		_, err = c.Write(data)
		if err != nil {
			log.Fatal("Writing client error: ", err)
		}
	}
}

func main() {
	log.Println("Starting echo server")
	ln, err := net.Listen("unix", "/tmp/go.sock")
	if err != nil {
		log.Fatal("Listen error: ", err)
	}

	sigc := make(chan os.Signal, 1)
	signal.Notify(sigc, os.Interrupt, syscall.SIGTERM)
	go func(ln net.Listener, c chan os.Signal) {
		sig := <-c
		log.Printf("Caught signal %s: shutting down.", sig)
		ln.Close()
		os.Exit(0)
	}(ln, sigc)

	for {
		fd, err := ln.Accept()
		if err != nil {
			log.Fatal("Accept error: ", err)
		}
		go echoServer(fd)
	}
}
When I close it using Ctrl+C, then the signal is captured and the socket is closed. When I rerun the program, everything works fine.
However, if the running process is abruptly killed, and if the program is restarted, the listen fails with the error Listen error: listen unix /tmp/go.sock: bind: address already in use
How can I handle this gracefully?
The reason I ask is this: I know that abruptly killing the process is not the normal method, but my program will be launched automatically as a daemon, and if the daemon is restarted I want it to be able to listen on the socket again without this error.
It could also be because of a prior instance still running, which I understand. The question here is how to programmatically identify and handle this situation in Go. As pointed out in the answer here, one can use SO_REUSEADDR in C programs. Is there such a possibility in Go? Also, how do C programs handle this multiple-instance problem?
You need to catch the signal and cleanup; some example code:
func HandleSIGINTKILL() chan os.Signal {
	sig := make(chan os.Signal, 1)
	signal.Notify(sig, syscall.SIGINT, syscall.SIGTERM)
	return sig
}
...
go func() {
	<-HandleSIGINTKILL()
	log.Println("Received termination signal")
	// Cleanup code here
	os.Exit(0)
}()
This will of course not work if you kill -9 the process; you will need to manually remove the socket (or have your init system do it for you).
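If you want the daemon itself to clean up after a kill -9 of a previous instance, one sketch is to remove a stale socket file right before listening; note this assumes no other live instance is still serving on that path:
const sockPath = "/tmp/go.sock"

// Remove a socket file left behind by a previously killed instance.
// This is unsafe if another live instance might still be serving on it.
if err := os.Remove(sockPath); err != nil && !os.IsNotExist(err) {
	log.Fatal("remove stale socket: ", err)
}
ln, err := net.Listen("unix", sockPath)
if err != nil {
	log.Fatal("Listen error: ", err)
}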

Re-creating mgo sessions in case of errors (read tcp 127.0.0.1:46954->127.0.0.1:27017: i/o timeout)

I wonder about MongoDB session management in Go using mgo, especially about how to correctly ensure a session is closed and how to react on write failures.
I have read the following:
Best practice to maintain a mgo session
Should I copy session for each operation in mgo?
Still, cannot apply it to my situation.
I have two goroutines which store event after event into MongoDB, sharing the same *mgo.Session, both looking essentially like the following:
func storeEvents(session *mgo.Session) {
	session_copy := session.Copy()
	// *** is it correct to defer the session close here? <-----
	defer session_copy.Close()
	col := session_copy.DB("DB_NAME").C("COLLECTION_NAME")
	for {
		event := GetEvent()
		err := col.Insert(&event)
		if err != nil {
			// *** insert FAILED - how to react properly? <-----
			session_copy = session.Copy()
			defer session_copy.Close()
		}
	}
}
}
After some hours, col.Insert(&event) returns the error
read tcp 127.0.0.1:46954->127.0.0.1:27017: i/o timeout
and I am unsure how to react to it. Once this error occurs, it occurs on all subsequent writes, hence it seems I have to create a new session. The alternatives I see are:
1) restart the whole goroutine, i.e.
if err != nil {
	go storeEvents(session)
	return
}
2) create a new session copy
if err != nil {
	session_copy = session.Copy()
	defer session_copy.Close()
	col := session_copy.DB("DB_NAME").C("COLLECTION_NAME")
	continue
}
--> Is it correct how I use defer session_copy.Close()? (Note that the defer above references the Close() of another session copy. In any case, those sessions will never be closed since the function never returns, i.e. over time many sessions will be created and never closed.)
Other options?
So I don't know if this is going to help you, but I don't have any issues with this setup.
I have a mongo package that I import from. This is a template of my mongo.go file:
package mongo

import (
	"time"

	"gopkg.in/mgo.v2"
)

var (
	// MyDB ...
	MyDB DataStore
)

// create the session before main starts
func init() {
	MyDB.ConnectToDB()
}

// DataStore contains a pointer to the mgo session
type DataStore struct {
	Session *mgo.Session
}

// ConnectToDB is a helper method that connects to the database
func (ds *DataStore) ConnectToDB() {
	mongoDBDialInfo := &mgo.DialInfo{
		Addrs:    []string{"ip"},
		Timeout:  60 * time.Second,
		Database: "db",
	}
	sess, err := mgo.DialWithInfo(mongoDBDialInfo)
	if err != nil {
		panic(err)
	}
	sess.SetMode(mgo.Monotonic, true)
	MyDB.Session = sess
}

// Close is a helper method that ensures the session is properly terminated
func (ds *DataStore) Close() {
	ds.Session.Close()
}
Then in another package, for example main (updated based on the comment below):
package main

import (
	"fmt"

	"gopkg.in/mgo.v2"

	"../models/mongo"
)

func main() {
	// Grab the main session which was instantiated in the mongo package init function
	sess := mongo.MyDB.Session
	// pass that session in
	storeEvents(sess)
}

func storeEvents(session *mgo.Session) {
	session_copy := session.Copy()
	defer session_copy.Close()
	// Handle panics in a deferred function.
	// You can turn this into a wrapper (middleware): remove it from this
	// function and just wrap your calls with it; using switch cases you can
	// handle all types of errors.
	defer func(session *mgo.Session) {
		if err := recover(); err != nil {
			fmt.Printf("Mongo insert has caused a panic: %s\n", err)
			fmt.Println("Attempting to insert again")
			session_copy := session.Copy()
			defer session_copy.Close()
			col := session_copy.DB("DB_NAME").C("COLLECTION_NAME")
			event := GetEvent()
			err := col.Insert(&event)
			if err != nil {
				fmt.Println("Attempting to insert again failed")
				return
			}
			fmt.Println("Attempting to insert again successful")
		}
	}(session)
	col := session_copy.DB("DB_NAME").C("COLLECTION_NAME")
	event := GetEvent()
	err := col.Insert(&event)
	if err != nil {
		panic(err)
	}
}
I use a similar setup on my production servers on AWS. I do over 1 million inserts an hour. Hope this helps. Another thing I've done to ensure that the mongo servers can handle the connections is to increase the ulimit on my production machines. It's discussed in this Stack Overflow thread.